Aside

Contact

Language Skills

R
JavaScript (d3.js)
C++
Python
Bash
SQL
AWK

Disclaimer

Made with the R package pagedown.

The source code is available on github.com/nstrayer/cv.

Last updated on 2022-12-22.

Main

Shiya Liu

I have made visualizations viewed by hundreds of thousands of people, sped up query times for 25 terabytes of data by an average of 4,800 times, and built packages for R that let you do magic.

Education

PhD Candidate, Biostatistics

Vanderbilt University

Nashville, TN

2020 - 2015

  • Focused on network models & interactive visualization platforms for electronic health records data
  • University Graduate Fellow

B.S., Mathematics, Statistics (minor C.S.)

University of Vermont

Burlington, VT

2015 - 2011

  • Thesis: An agent based model of Diel Vertical Migration patterns of Mysis diluviana

Research Experience

Graduate Research Assistant

TBILab (Yaomin Xu’s Lab)

Vanderbilt University

Current - 2015

  • Primarily working with large EHR and Biobank datasets.
  • Developing network-based methods to investigate and visualize clinically relevant patterns in data.

Data Science Researcher

Data Science Lab

Johns Hopkins University

2018 - 2017

  • Built R Shiny applications in the contexts of wearables and statistics education.
  • Work primarily done in R Shiny and JavaScript (Node and d3.js).

Undergraduate Researcher

Rubenstein Ecosystems Science Laboratory

University of Vermont

2015 - 2013

  • Analyzed and visualized data for CATOS fish tracking project.
  • Head of data mining project to establish temporal trends in population densities of Mysis diluviana (Mysis).
  • Ran project to mathematically model the migration patterns of Mysis (honors thesis project).

Human Computer Interaction Researcher

LabInTheWild (Reinecke Lab)

University of Michigan

2015 - 2015

  • Led development and implementation of interactive data visualizations to help users compare themselves to other demographics.

Undergraduate Researcher

Bentil Laboratory

University of Vermont

2014 - 2013

  • Developed mathematical model to predict the transport of sulfur through the environment with applications in waste cleanup.

Research Assistant

Adair Laboratory

University of Vermont

2013 - 2012

  • Independently analyzed and constructed statistical models for large data sets pertaining to carbon decomposition rates.

Industry Experience

I have worked in a variety of roles ranging from journalist to software engineer to data scientist. I like collaborative environments where I can learn from my peers.

Software Engineer

RStudio

Remote

Current - 2020

  • Helping make programming web applications with R easier and more beautiful as part of the Shiny team.

Data Journalist - Graphics Department

New York Times

New York, New York

2016 - 2016

  • Reporter with the graphics desk covering topics in science, politics, and sport.
  • Work primarily done in R, Javascript, and Adobe Illustrator.

Engineering Intern - User Experience

Dealer.com

Burlington, VT

2015 - 2015

  • Built internal tool to help analyze and visualize user interaction with back-end products.

Data Science Intern

Dealer.com

Burlington, VT

2015 - 2015

  • Worked with the product analytics team to help parse and visualize large stores of data to drive business decisions.

Data Artist In Residence

Conduce

Carpinteria, CA

2015 - 2014

  • Envisioned, prototyped, and implemented a visualization framework over the course of one month.
  • Constructed training protocol for bringing third parties up to speed with new protocol.

Software Engineering Intern

Conduce

Carpinteria, CA

2014 - 2014

  • Incorporated d3.js into the company’s main software platform.




Teaching Experience

I am passionate about education. I believe that no topic is too complex if the teacher is empathetic and willing to think about new methods of approaching a task.

Javascript for Shiny Users

RStudio::conf 2020

N/A

2020

Data Visualization Best Practices

DataCamp

N/A

2019 - 2019

  • Designed a course from the ground up to teach best practices for scientific visualizations.
  • Uses R and ggplot2.
  • In top 10% on platform by popularity.

Improving your visualization in Python

DataCamp

N/A

2019 - 2019

  • Designed a course from the ground up to teach advanced methods for enhancing visualizations.
  • Uses Python, matplotlib, and seaborn.

Advanced Statistical Learning and Inference

Vanderbilt Biostatistics Department

Nashville, TN

2018 - 2017

  • Served as TA and lectured.
  • Topics ranged from penalized regression to boosted trees and neural networks.
  • Highest-level course offered in the department.

Advanced Statistical Computing

Vanderbilt Biostatistics Department

Nashville, TN

2018 - 2018

  • Served as TA and lectured.
  • Covered modern statistical computing algorithms.
  • Fourth-year PhD-level class.

Statistical Computing in R

Vanderbilt Biostatistics Department

Nashville, TN

2017 - 2017

  • Served as TA and lectured.
  • Covered an introduction to the R language for statistics applications.
  • Graduate-level class.

Selected Data Science Writing

I regularly write about data science and visualization on my blog, LiveFreeOrDichotomize.

Using AWK and R to Parse 25tb

LiveFreeOrDichotomize.com

N/A

2019

  • Story of parsing large amounts of genomics data.
  • Provided advice for dealing with data much larger than disk.
  • Reached top of HackerNews.

Classifying physical activity from smartphone data

RStudio Tensorflow Blog

N/A

2018

  • Walkthrough of training a convolutional neural network to achieve state-of-the-art recognition of activities from accelerometer data.
  • Contracted article.

The United States of Seasons

LiveFreeOrDichotomize.com

N/A

2018

  • GIS analysis of weather data to find the most ‘seasonal’ locations in the United States.
  • Used Bayesian regression methods for smoothing sparse geospatial data.

A year as told by fitbit

LiveFreeOrDichotomize.com

N/A

2017

  • Analyzed a full year’s worth of second-level heart rate data from a wearable device.
  • Demonstrated visualization-based inference for large data.

MCMC and the case of the spilled seeds

LiveFreeOrDichotomize.com

N/A

2017

  • Full Bayesian MCMC sampler running in your browser.
  • Coded from scratch in vanilla Javascript.

The Traveling Metallurgist

LiveFreeOrDichotomize.com

N/A

2017

  • Pure JavaScript implementation of a traveling salesman solution using simulated annealing.
  • Allows reader to customize the number and location of cities to attempt to trick the algorithm.

Selected Press (About)

Great paper? Swipe right on the new ‘Tinder for preprints’ app

Science

N/A

2017 - 2017

  • Story of the app Papr made with Jeff Leek and Lucy D’Agostino McGowan.

Swipe right for science: Papr app is ‘Tinder for preprints’

Nature News

N/A

2017 - 2017

  • Second press article for app Papr.

The Deeper Story in the Data

University of Vermont Quarterly

N/A

2016 - 2016

  • Story on my path post graduation and the power of narrative.



Selected Press (By)

The Great Student Migration

The New York Times

N/A

2016 - 2016

  • Most shared and discussed article from the New York Times for August 2016.

Wildfires are Getting Worse

The New York Times

N/A

2016 - 2016

  • GIS analysis and modeling of fire patterns and trends
  • Data in collaboration with NASA and USGS

Who’s Speaking at the Democratic National Convention?

The New York Times

N/A

2016 - 2016

  • Data scraped from C-SPAN records to figure out who talked at past conventions.

Who’s Speaking at the Republican National Convention?

The New York Times

N/A

2016 - 2016

  • Used the same data scraping techniques as Who’s Speaking at the Democratic National Convention?

A Trail of Terror in Nice, Block by Block

The New York Times

N/A

2016 - 2016

  • Led research effort to put together story of 2016 terrorist attack in Nice, France in less than 12 hours.
  • Work won a silver medal at Malofiej 2017 and gold at the Society for News Design.

Selected Publications, Posters, and Talks

Building a software package in tandem with machine learning methods research can result in both more rigorous code and more rigorous research

ENAR 2020

N/A

2020

  • Invited talk in Human Data Interaction section.
  • How and why building an R package can benefit methodological research

Stochastic Block Modeling in R, Statistically rigorous clustering with rigorous code

RStudio::conf 2020

N/A

2020

  • Invited talk about new sbmR package.
  • Focus on how software development and methodological research can both benefit when done in tandem.

PheWAS-ME: A web-app for interactive exploration of multimorbidity patterns in PheWAS

Bioinformatics

N/A

2020

  • Manuscript detailing application for the exploration of multimorbidity patterns in PheWAS analyses
  • See landing page for more information.

Charge Reductions Associated with Shortening Time to Recovery in Septic Shock

Chest

N/A

2019 - 2019

  • Authored with Wesley H. Self, MD MPH; Dandan Liu, PhD; Stephan Russ, MD, MPH; Michael J. Ward, MD, PhD, MBA; Nathan I. Shapiro, MD, MPH; Todd W. Rice, MD, MSc; Matthew W. Semler, MD, MSc.

Multimorbidity Explorer | A shiny app for exploring EHR and biobank data

RStudio::conf 2019

N/A

2019 - 2019

  • Contributed Poster. Authored with Yaomin Xu.

Taking a network view of EHR and Biobank data to find explainable multivariate patterns

Vanderbilt Biostatistics Seminar Series

N/A

2019 - 2019

  • University-wide seminar series.

Patient-specific risk factors independently influence survival in Myelodysplastic Syndromes in an unbiased review of EHR records

Under review (copy available upon request)

N/A

2019

  • Bayesian network analysis used to find novel subgroups of patients with Myelodysplastic Syndromes (MDS).
  • Analysis done using method built for my dissertation.

Patient specific comorbidities impact overall survival in myelofibrosis

Under review (copy available upon request)

N/A

2019

  • Bayesian network analysis used to find robust novel subgroups of patients with given genetic mutations.
  • Analysis done using method built for my dissertation.

R timelineViz: Visualizing the distribution of study events in longitudinal studies

Under review (copy available upon request)

N/A

2018 - 2018

  • Authored with Alex Sunderman of the Vanderbilt Department of Epidemiology.

Continuous Classification using Deep Neural Networks

Vanderbilt Biostatistics Qualification Exam

N/A

2017 - 2017

  • Review of methods for classifying continuous data streams using neural networks
  • Successfully met qualifying examination standards

Asymmetric Linkage Disequilibrium: Tools for Dissecting Multiallelic LD

Journal of Human Immunology

N/A

2015 - 2015

  • Authored with Richard Single, Vanja Paunic, Mark Albrecht, and Martin Maiers.

An Agent Based Model of Mysis Migration

International Association of Great Lakes Research Conference

N/A

2015 - 2015

  • Authored with Brian O’Malley, Sture Hansson, and Jason Stockwell.

Declines of Mysis diluviana in the Great Lakes

Journal of Great Lakes Research

N/A

2015 - 2015

  • Authored with Peter Euclide and Jason Stockwell.